---
title: "Using a delay-adjusted case fatality ratio to estimate under-reporting"
description: "Using a corrected case fatality ratio, we calculate estimates of the level of under-reporting for any country with greater than ten deaths"
status: real-time-report
rmarkdown_html_fragment: true
update: 2020-05-28
authors:
  - id: tim_russell
    corresponding: true
  - id: joel_hellewell
    equal: 1
  - id: sam_abbott
    equal: 1
  - id: nick_golding
  - id: hamish_gibbs
  - id: chris_jarvis
  - id: kevin_vanzandvoort
  - id: ncov-group
  - id: stefan_flasche
  - id: roz_eggo
  - id: john_edmunds
  - id: adam_kucharski
---

Aim

To estimate the percentage of symptomatic COVID-19 cases reported in different countries using case fatality ratio estimates based on data from the ECDC, correcting for delays between confirmation and death.

Methods Summary

Current estimates for percentage of symptomatic cases reported for countries with greater than ten deaths

Temporal variation

Figure 1: Temporal variation in reporting rate. We calculate the percentage of symptomatic cases reported on each day a country has had more than ten deaths. We then fit a Gaussian Process (GP) to these data (see the Temporal variation model fitting section for details), highlighting the temporal trend of each country's reporting rate. The red shaded region is the 95% CrI of the fitted GP.

Adjusted symptomatic case estimates

Figure 2: Estimated number of new symptomatic cases, calculated using our temporal under-reporting estimates. For each country with an under-reporting estimate, we adjust the reported case numbers each day using our temporal under-reporting estimates to arrive at an estimate of the true number of symptomatic cases each day. The shaded blue region represents the 95% CrI, calculated directly using the 95% CrI of the temporal under-reporting estimate.

Reported cases

Figure 3: Reported number of cases each day, pulled from the ECDC and plotted against time for comparison with our estimated true numbers of symptomatic cases each day, adjusted using our under-reporting estimates.

Table of current estimates

Country Percentage of symptomatic cases reported (95% CI) Total cases Total deaths
Afghanistan 68% (51%-87%) 12,456 227
Albania 81% (35%-100%) 1,050 33
Algeria 30% (20%-42%) 8,857 623
Andorra 22% (11%-39%) 763 51
Argentina 32% (25%-42%) 13,920 500
Armenia 72% (52%-94%) 7,774 98
Australia 86% (53%-100%) 7,139 103
Austria 63% (37%-92%) 16,515 645
Azerbaijan 86% (63%-100%) 4,568 54
Bahamas 56% (15%-100%) 100 11
Bahrain 98% (89%-100%) 9,633 15
Bangladesh 66% (49%-89%) 38,292 544
Belarus 99% (92%-100%) 38,956 214
Belgium 24% (18%-30%) 57,592 9,364
Bolivia 24% (19%-32%) 7,768 280
Bosnia and Herzegovina 20% (11%-37%) 2,435 151
Brazil 19% (16%-23%) 411,821 25,598
Bulgaria 23% (17%-33%) 2,460 133
Burkina Faso 34% (17%-67%) 845 53
Cameroon 51% (29%-82%) 5,436 177
Canada 19% (15%-23%) 87,508 6,765
Chad 36% (19%-59%) 715 64
Chile 75% (58%-94%) 82,289 841
China 98% (98%-100%) 84,106 4,638
Colombia 38% (30%-48%) 24,104 803
Congo 40% (18%-79%) 571 19
Costa Rica 69% (27%-100%) 984 10
Cote d'Ivoire 88% (63%-100%) 2,556 31
Croatia 23% (12%-40%) 2,244 101
Cuba 56% (30%-90%) 1,974 82
Cyprus 85% (51%-100%) 939 17
Czechia 36% (27%-48%) 9,086 317
Democratic Republic of the Congo 60% (37%-91%) 2,659 68
Denmark 61% (42%-87%) 11,480 565
Dominican Republic 79% (63%-94%) 15,723 474
Ecuador 19% (11%-27%) 38,103 3,275
Egypt 37% (28%-46%) 19,666 816
El Salvador 62% (39%-92%) 2,109 39
Estonia 36% (22%-59%) 1,840 66
Finland 84% (45%-100%) 6,692 313
France 15% (11%-18%) 145,746 28,596
Gabon 92% (71%-100%) 2,319 14
Georgia 75% (36%-100%) 735 12
Germany 33% (26%-41%) 179,717 8,411
Ghana 97% (88%-100%) 7,303 34
Greece 23% (13%-34%) 2,903 173
Guatemala 48% (34%-68%) 4,145 68
Guernsey 46% (14%-97%) 252 13
Guinea 94% (77%-100%) 3,446 21
Guyana 52% (13%-99%) 139 11
Haiti 42% (20%-75%) 1,320 34
Honduras 34% (23%-48%) 4,640 194
Hungary 12% (8%-17%) 3,816 509
Iceland 87% (54%-100%) 1,805 10
India 39% (32%-46%) 158,333 4,531
Indonesia 23% (18%-29%) 23,851 1,473
Iran 53% (43%-63%) 141,591 7,564
Iraq 24% (16%-36%) 5,135 175
Ireland 35% (23%-52%) 24,803 1,631
Isle of Man 31% (9.1%-91%) 336 24
Israel 81% (59%-99%) 16,793 281
Italy 14% (11%-17%) 231,139 33,072
Japan 11% (7.8%-15%) 16,651 858
Jersey 14% (6.4%-35%) 308 29
Kazakhstan 98% (89%-100%) 9,576 37
Kenya 45% (21%-88%) 1,471 55
Kosovo 60% (34%-97%) 1,047 30
Kuwait 95% (82%-100%) 23,267 175
Kyrgyzstan 84% (54%-100%) 1,594 16
Latvia 52% (27%-91%) 1,057 23
Lebanon 76% (43%-100%) 1,161 26
Liberia 21% (8%-56%) 266 27
Lithuania 26% (12%-42%) 1,647 66
Luxembourg 48% (29%-69%) 4,001 110
Malaysia 96% (77%-100%) 7,619 115
Mali 17% (11%-25%) 1,116 70
Mauritius 62% (18%-100%) 334 10
Mexico 9.2% (7.5%-11%) 78,023 8,597
Moldova 30% (23%-39%) 7,537 274
Morocco 98% (83%-100%) 7,601 202
Netherlands 24% (18%-31%) 45,768 5,871
New Zealand 52% (25%-93%) 1,154 22
Niger 15% (7.7%-26%) 955 64
Nigeria 46% (33%-62%) 8,733 254
North Macedonia 23% (15%-33%) 2,040 119
Norway 73% (43%-98%) 8,383 235
Oman 97% (86%-100%) 8,373 38
Pakistan 65% (53%-79%) 61,227 1,260
Panama 56% (40%-74%) 11,728 315
Paraguay 91% (66%-100%) 884 11
Peru 36% (29%-43%) 135,905 3,983
Philippines 42% (30%-56%) 15,049 904
Poland 43% (33%-54%) 22,473 1,028
Portugal 30% (22%-39%) 31,292 1,356
Puerto Rico 64% (43%-90%) 3,397 129
Qatar 90% (50%-100%) 48,947 30
Romania 25% (19%-34%) 18,594 1,219
Russia 67% (55%-80%) 370,680 3,968
San Marino 81% (36%-100%) 667 42
Saudi Arabia 99% (93%-100%) 78,541 425
Senegal 83% (57%-100%) 3,253 39
Serbia 94% (72%-100%) 11,275 240
Sierra Leone 18% (11%-31%) 782 45
Singapore 91% (59%-100%) 32,876 23
Sint Maarten 12% (4.1%-32%) 77 15
Slovakia 70% (41%-99%) 1,515 28
Slovenia 19% (11%-34%) 1,471 107
Somalia 56% (31%-86%) 1,731 67
South Africa 34% (24%-44%) 25,937 552
South Korea 50% (22%-87%) 11,344 269
Sudan 25% (17%-35%) 4,146 184
Sweden 15% (11%-19%) 35,088 4,220
Switzerland 25% (19%-33%) 30,678 1,647
Tajikistan 86% (62%-100%) 3,100 46
Thailand 77% (49%-100%) 3,054 57
Togo 67% (30%-100%) 395 13
Tunisia 47% (17%-95%) 1,051 48
Turkey 77% (62%-90%) 158,762 4,397
Ukraine 44% (32%-59%) 21,584 644
United Arab Emirates 99% (93%-100%) 31,969 255
United Kingdom 24% (20%-28%) 267,240 37,460
United Republic of Tanzania 55% (28%-94%) 509 21
United States of America 34% (28%-40%) 1,699,933 100,442
Uruguay 43% (21%-81%) 803 22
Uzbekistan 94% (77%-100%) 3,333 14
Venezuela 90% (64%-100%) 1,245 11
Yemen 4.2% (2.7%-6.2%) 255 53

Table 1: Estimates for the proportion of symptomatic cases reported in different countries using cCFR estimates based on case and death time-series data from the ECDC. Total cases and deaths in each country are also shown. Confidence intervals are calculated using an exact binomial test at the 95% level.

Adjusting for outcome delay in CFR estimates

During an outbreak, the naive CFR (nCFR), i.e. the ratio of reported deaths to date to reported cases to date, will underestimate the true CFR because the outcome (recovery or death) is not known for all cases [5]. We can therefore estimate the true denominator for the CFR (i.e. the number of cases with known outcomes) by accounting for the delay from confirmation to death [1].

We assumed the delay from confirmation to death followed the same distribution as the estimated delay from hospitalisation to death, based on data from the COVID-19 outbreak in Wuhan, China, between the 17th December 2019 and the 22nd January 2020, accounting for right-censoring in the data as a result of as-yet-unknown disease outcomes (Figure 1, panels A and B in [7]). The distribution used is a lognormal fit with a mean delay of 13 days and a standard deviation of 12.7 days [7].

To correct the CFR, we use the case and death incidence data to estimate the proportion of cases with known outcomes [1,6]:

\[ u_{t} = \frac{ \sum_{j = 0}^{t} c_{t-j} f_j}{c_t}, \]

where \(u_t\) represents the underestimation of the proportion of cases with known outcomes [1,5,6] and is used to scale the value of the cumulative number of cases in the denominator in the calculation of the cCFR, \(c_{t}\) is the daily case incidence at time \(t\), and \(f_j\) is the proportion of cases with a delay of \(j\) days between confirmation and death.
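As a rough illustration of the correction above, the following Python sketch discretises a lognormal delay with the stated mean (13 days) and standard deviation (12.7 days), evaluates the numerator of \(u_t\) for each day, and compares the naive and delay-adjusted CFR. The function names, the daily discretisation of the delay, the way known outcomes are summed over days, and the example incidence series are all assumptions made for this example; it is not the authors' implementation, which is in the repository given under Code and data availability.

```python
import math


def lognormal_params(mean, sd):
    """Convert a mean and standard deviation on the natural scale to (mu, sigma)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_cdf(x, mu, sigma):
    """Cumulative distribution function of a lognormal distribution."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def delay_pmf(j, mu, sigma):
    """f_j: probability of a delay between j and j + 1 days (assumed discretisation)."""
    return lognormal_cdf(j + 1, mu, sigma) - lognormal_cdf(j, mu, sigma)


def known_outcomes(cases, mu, sigma):
    """For each day t, sum_{j=0}^{t} c_{t-j} f_j: cases whose outcome is expected to become known on day t."""
    return [
        sum(cases[t - j] * delay_pmf(j, mu, sigma) for j in range(t + 1))
        for t in range(len(cases))
    ]


def naive_and_corrected_cfr(cases, deaths, mean_delay=13.0, sd_delay=12.7):
    """Return (nCFR, cCFR) from daily case and death incidence."""
    mu, sigma = lognormal_params(mean_delay, sd_delay)
    total_known = sum(known_outcomes(cases, mu, sigma))
    total_deaths = sum(deaths)
    return total_deaths / sum(cases), total_deaths / total_known


# Hypothetical daily incidence, for illustration only.
example_cases = [10] * 30
example_deaths = [0] * 20 + [1] * 10
n_cfr, c_cfr = naive_and_corrected_cfr(example_cases, example_deaths)
```

In this toy series only part of the recent cases have a known outcome, so the corrected denominator is smaller than the cumulative case count and the cCFR is larger than the nCFR.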

Approximating the proportion of symptomatic cases reported

At this stage, raw estimates of the CFR of COVID-19 correcting for delay to outcome, but not under-reporting, have been calculated. These estimates range between 1% and 1.5% [1–3]. We assume a CFR of 1.4% (95% CrI: 1.2-1.7%), taken from a recent large study [3], as a baseline CFR. We use it to approximate the potential level of under-reporting in each country. Specifically, we calculate \(\frac{1.4\%}{\text{cCFR}}\) for each country to estimate the approximate fraction of cases reported.
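The ratio can be sketched in a few lines of Python. The function name and the example cCFR values are hypothetical, and capping the result at 100% is an assumption made here (it is consistent with the upper bounds in Table 1, but the text does not state how values above one are handled).

```python
def reporting_fraction(c_cfr, baseline_cfr=0.014):
    """Approximate fraction of symptomatic cases reported: baseline CFR / cCFR, capped at 1."""
    if c_cfr <= 0:
        raise ValueError("cCFR must be positive")
    return min(1.0, baseline_cfr / c_cfr)


# Hypothetical delay-adjusted CFRs, for illustration only.
high_ccfr = reporting_fraction(0.07)   # cCFR well above baseline: 1.4% / 7%
low_ccfr = reporting_fraction(0.010)   # cCFR below baseline: capped at 100%
```

A country with a delay-adjusted CFR of 7% would, under this approximation, be reporting about a fifth of its symptomatic cases.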

Temporal variation model fitting

We estimate the level of under-reporting on every day for each country that has had more than ten deaths. We then fit a Gaussian Process (GP) model using the R libraries greta and greta.gp. The parameters we fit and their priors are the following:

\[ \begin{aligned} &\sigma \sim \text{Log Normal(-1, 1)}: \quad &\text{Variance of the reporting kernel} \\ &\text{L} \sim \text{Log Normal(4, 0.5)}: \quad &\text{Lengthscale of the reporting kernel} \\ &\sigma_{\text{obs}} \sim \text{Truncated Normal(0, 0.5)}: \quad &\text{Variance of the observation kernel, truncated at 0} \end{aligned} \]

The kernel is split into two components: the reporting kernel \(R\), and the observation kernel \(O\). The reporting component has a standard squared-exponential form. For the observation component, we use an i.i.d. noise kernel to account for observation overdispersion, which can smooth out overly clumped death time-series. This is important because some countries are known to have reported an unusually large number of deaths on a single day, owing to past under-reporting.
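To make the two-component kernel concrete, the sketch below builds the covariance matrix \(R + O\) over a handful of days in plain Python. It is an illustration only, not the greta.gp code used for the analysis: the function names and hyperparameter values are hypothetical, and it assumes that \(\sigma\) and \(\sigma_{\text{obs}}\) enter directly as variances, following the descriptions in the list of priors.

```python
import math


def reporting_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential kernel R(t1, t2)."""
    return variance * math.exp(-((t1 - t2) ** 2) / (2.0 * lengthscale ** 2))


def combined_kernel_matrix(times, variance, lengthscale, obs_variance):
    """Covariance matrix R + O, where O adds i.i.d. noise on the diagonal only."""
    return [
        [
            reporting_kernel(ti, tj, variance, lengthscale)
            + (obs_variance if i == j else 0.0)
            for j, tj in enumerate(times)
        ]
        for i, ti in enumerate(times)
    ]


# Hypothetical hyperparameter values, for illustration only.
days = list(range(5))
K = combined_kernel_matrix(
    days, variance=math.exp(-1), lengthscale=math.exp(4), obs_variance=0.3
)
```

The off-diagonal entries come only from the smooth reporting kernel, so nearby days are strongly correlated, while the extra diagonal term lets individual days deviate from the trend.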

In the sampling and fitting process, we calculate the expected number of deaths at each time-point, given the baseline CFR. We then use a Poisson likelihood for the observed number of deaths, with the expected number of deaths as its rate.
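A minimal sketch of this likelihood is given below. It assumes that the expected number of deaths on a day is the baseline CFR multiplied by the number of reported cases with known outcomes, divided by the proportion of cases reported; that mapping is one reading of the text rather than a statement of the authors' model, and the function names and numbers are hypothetical.

```python
import math


def expected_deaths(known_outcome_cases, reporting_fraction, baseline_cfr=0.014):
    """Expected deaths if reported cases with known outcomes are a fraction of the true cases."""
    return baseline_cfr * known_outcome_cases / reporting_fraction


def poisson_log_likelihood(observed_deaths, rate):
    """Log of the Poisson probability mass function."""
    return observed_deaths * math.log(rate) - rate - math.lgamma(observed_deaths + 1)


def log_likelihood(deaths, known_cases, reporting, baseline_cfr=0.014):
    """Sum of daily Poisson log-likelihood terms for a given reporting trajectory."""
    return sum(
        poisson_log_likelihood(d, expected_deaths(k, r, baseline_cfr))
        for d, k, r in zip(deaths, known_cases, reporting)
    )


# Hypothetical single-day data: 28 deaths among 1,000 reported cases with known outcomes.
ll_half = log_likelihood([28], [1000], [0.5])      # expected deaths = 28
ll_quarter = log_likelihood([28], [1000], [0.25])  # expected deaths = 56
ll_full = log_likelihood([28], [1000], [1.0])      # expected deaths = 14
```

With these numbers the log-likelihood is highest for a reporting proportion of 0.5, where the expected number of deaths matches the observed count.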

Adjusting case counts for under-reporting

We adjust the reported number of cases each day, pulled from the ECDC. Specifically, we divide the case numbers of each day by the “proportion of cases reported” estimate that we calculate for that day and country.
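The adjustment for a single day can be sketched as follows. The function name and the numbers are hypothetical; the interval is obtained by dividing by the bounds of the reporting estimate, in line with the caption of Figure 2.

```python
def adjust_cases(reported_cases, reporting_estimate, reporting_lower, reporting_upper):
    """Return (estimate, lower, upper) for the true number of symptomatic cases on one day."""
    estimate = reported_cases / reporting_estimate
    lower = reported_cases / reporting_upper   # high reporting implies few missed cases
    upper = reported_cases / reporting_lower   # low reporting implies many missed cases
    return estimate, lower, upper


# Hypothetical day: 100 reported cases, 25% (20%-50%) of symptomatic cases reported.
adjusted = adjust_cases(100, 0.25, 0.20, 0.50)
```

Note that the bounds swap: the upper bound of the reporting proportion gives the lower bound of the adjusted case count, and vice versa.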

Limitations

Implicit in assuming that the under-reporting is \(\frac{1.4\%}{\text{cCFR}}\) for a given country is that any deviation from the assumed 1.4% CFR is entirely down to under-reporting. In reality, the burden on healthcare systems is a likely contributing factor to CFR estimates higher than 1.4%, along with many other country-specific factors.

The following is a list of the other prominent assumptions made in our analysis:

Code and data availability

The code is publicly available at https://github.com/thimotei/CFR_calculation. The data required for this analysis are time-series of both cases and deaths, along with the corresponding delay distribution. We scrape these data from the ECDC, using the NCoVUtils package [8].

Acknowledgements

The authors, on behalf of the Centre for the Mathematical Modelling of Infectious Diseases (CMMID) COVID-19 working group, wish to thank DSTL for providing the High Performance Computing facilities and associated expertise that have enabled these models to be prepared, run and processed in an appropriately rapid and highly efficient manner.

References

1 Russell TW, Hellewell J, Jarvis CI et al. Estimating the infection and case fatality ratio for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. medRxiv 2020.

2 Verity R, Okell LC, Dorigatti I et al. Estimates of the severity of COVID-19 disease. medRxiv 2020.

3 Guan W-j, Ni Z-y, Hu Y et al. Clinical characteristics of coronavirus disease 2019 in China. New England Journal of Medicine 2020.

4 Shim E, Mizumoto K, Choi W et al. Estimating the risk of COVID-19 death during the course of the outbreak in Korea, February-March, 2020. medRxiv 2020.

5 Kucharski AJ, Edmunds WJ. Case fatality rate for Ebola virus disease in West Africa. The Lancet 2014;384:1260.

6 Nishiura H, Klinkenberg D, Roberts M et al. Early epidemiological assessment of the virulence of emerging infectious diseases: A case study of an influenza pandemic. PLoS One 2009;4.

7 Linton NM, Kobayashi T, Yang Y et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. Journal of Clinical Medicine 2020;9:538.

8 Abbott S MJ Hellewell J. NCoVUtils: Utility functions for the 2019-nCoV outbreak. doi:10.5281/zenodo.3635417 2020.